sir-v0 implements a compartmental SIR epidemic model and is closely based on the implementation from Morris et al. This is not a sequential decision problem: each episode lasts only one time step, so it is a bandit problem.
Observation Space: The agent observes the number of susceptible and infectious people, along with the basic reproduction number of the system.
Model Dynamics: The model evolves according to the standard SIR compartmental model. More details about this model can be found in Morris et al.
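For reference, the textbook SIR equations are reproduced here, assuming the compartments $S$, $I$, and $R$ are expressed as population fractions, with $\beta$ the infectivity parameter and $\gamma$ the recovery rate. The symbol names are conventional choices, and the environment's exact parameterization may differ in scaling.

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta S I, \\
\frac{dI}{dt} &= \beta S I - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad R_0 = \frac{\beta}{\gamma}.
\end{aligned}
$$

In this form the infectious fraction grows while $S > 1/R_0$ and peaks when $S = 1/R_0$, which is why an intervention that temporarily lowers $\beta$ can reduce the height of the peak.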
Action Space: By default, this environment uses the full suppression intervention, in which the agent selects the time at which to enact a strict quarantine lockdown. In the model, this equates to reducing the infectivity parameter, beta, to zero for a period of time.
Reward Function: The agent is incentivized to minimize the peak of the infectious population.
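The one-step (bandit) structure of sir-v0 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the sir-v0 implementation: the function names `simulate_peak` and `reward`, the parameter values, the 28-day lockdown length, the forward-Euler integrator, and the use of the negative peak as the reward.

```python
def simulate_peak(start, duration=28.0, beta=0.3, gamma=0.1,
                  s0=0.99, i0=0.01, t_max=300.0, dt=0.1):
    """Euler-integrate an SIR model (population fractions) and return the
    peak infectious fraction.

    Assumption: full suppression sets beta to zero for
    start <= t < start + duration. All defaults are illustrative.
    """
    s, i = s0, i0
    peak = i
    for step in range(round(t_max / dt)):
        t = step * dt
        # Transmission is switched off inside the lockdown window.
        b = 0.0 if start <= t < start + duration else beta
        new_infections = b * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        peak = max(peak, i)
    return peak


def reward(start):
    """Hypothetical reward: the negative of the peak infectious fraction."""
    return -simulate_peak(start)


# One action per episode: choose the lockdown start time, observe one reward.
candidates = [float(t) for t in range(0, 60, 5)]
best_start = max(candidates, key=reward)
```

Because the whole outbreak is simulated inside a single call, the agent only ever maps one action (the start time) to one reward, which is what makes this a bandit problem rather than a sequential one. Passing `float("inf")` as the start time gives the no-intervention baseline for comparison.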
| Epidemic Gym |
| ### sir_multi-v0 {data-commentary-width=400} |
| *** |
sir_multi-v0 implements a compartmental SIR epidemic model and is closely based on the implementation from Morris et al. This environment differs from sir-v0 in that it is a sequential decision problem: the agent decides over the course of an outbreak whether to quarantine, rather than choosing a single intervention at the initial time step as in sir-v0. |
| Observation Space: The agent observes the number of susceptible and infectious people, along with the basic reproduction number of the system. |
| Model Dynamics: The model evolves according to the standard SIR compartmental model. More details about this model can be found in Morris et al. |
| Action Space: By default, this environment uses the fixed control intervention, in which the agent selects the strictness of the quarantine for the duration of one week. This equates to reducing the infectivity parameter by some amount in the range [0, 1). The agent has an intervention budget of 8 weeks. |
| Reward Function: At each time step, the agent is penalized by the maximum number of infectious people observed during that step. |
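The sequential structure of sir_multi-v0 can be sketched as a small weekly-step environment. This is a hypothetical illustration, not the actual sir_multi-v0 code. The class name `SirMultiSketch`, the helper `run_episode`, the parameter values, and the 40-week horizon are all invented for the example. It also assumes that an action `a` scales beta by `(1 - a)`, that each week with a nonzero action consumes one week of the budget, and that an exhausted budget forces the action to zero.

```python
class SirMultiSketch:
    """Hypothetical weekly-step SIR environment (population fractions)."""

    def __init__(self, beta=0.3, gamma=0.1, s0=0.99, i0=0.01,
                 budget_weeks=8, horizon_weeks=40, dt=0.1):
        self.beta, self.gamma = beta, gamma
        self.s0, self.i0 = s0, i0
        self.budget_weeks = budget_weeks
        self.horizon_weeks = horizon_weeks
        self.dt = dt
        self.reset()

    def reset(self):
        self.s, self.i = self.s0, self.i0
        self.budget = self.budget_weeks
        self.week = 0
        return self._obs()

    def _obs(self):
        # Susceptible fraction, infectious fraction, basic reproduction number.
        return (self.s, self.i, self.beta / self.gamma)

    def step(self, action):
        if not 0.0 <= action < 1.0:
            raise ValueError("action must lie in [0, 1)")
        # Assumption: an exhausted budget means no further intervention.
        if action > 0.0 and self.budget <= 0:
            action = 0.0
        if action > 0.0:
            self.budget -= 1
        # Assumption: the action scales beta multiplicatively for one week.
        b = self.beta * (1.0 - action)
        week_peak = self.i
        for _ in range(round(7.0 / self.dt)):
            new_infections = b * self.s * self.i * self.dt
            recoveries = self.gamma * self.i * self.dt
            self.s -= new_infections
            self.i += new_infections - recoveries
            week_peak = max(week_peak, self.i)
        self.week += 1
        done = self.week >= self.horizon_weeks
        # Per-step penalty: the largest infectious fraction seen this week.
        return self._obs(), -week_peak, done


def run_episode(policy, env=None):
    """Roll out policy(obs, remaining_budget) and return the summed reward."""
    env = env or SirMultiSketch()
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, r, done = env.step(policy(obs, env.budget))
        total += r
    return total


no_action_return = run_episode(lambda obs, budget: 0.0)
early_lockdown_return = run_episode(lambda obs, budget: 0.8 if budget > 0 else 0.0)
```

Unlike the one-shot sketch for sir-v0, the policy here is queried every week and can react to the current susceptible and infectious fractions, so deciding when to spend the limited budget becomes part of the problem.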